knitr::opts_chunk$set(echo = TRUE)

Question and Background

Taylor Swift

Add album timelines and any information that could be relevant to her music

Spotify Audio Features

To fully capture Taylor’s evolution, we wanted to consider both quantitative (audio features) and qualitative (natural language processing) aspects of her work. We hypothesized that we would see a progression in both her technical sound and the content of her songs as she pivoted from an acoustic, country artist to a pop artist.

To capture the technical aspects of her sound, we used 11 quantitative audio features provided by Spotify: acousticness, danceability, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, and valence. For more information on these features, see the audio features reference in Spotify's Web API documentation.

Natural Language Processing

Initial Exploratory Analysis - Song Metrics and Spotify Features

To examine Taylor’s musical evolution, we focused on the features we suspected would change the most from album to album: danceability, valence, and energy, along with song length. The plots below show how these features change across albums.

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   ID = col_character(),
##   Name = col_character(),
##   Length = col_double(),
##   danceability = col_double(),
##   energy = col_double(),
##   key = col_double(),
##   loudness = col_double(),
##   mode = col_double(),
##   speechiness = col_double(),
##   acousticness = col_double(),
##   instrumentalness = col_double(),
##   liveness = col_double(),
##   valence = col_double(),
##   tempo = col_double(),
##   Release = col_double(),
##   Album = col_character()
## )
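The per-album averages behind these plots come down to a grouped mean. Below is a minimal base-R sketch; the data frame `songs`, its placeholder album names, and all of its values are made up for illustration and only mimic a few of the columns listed in the specification above (`Album`, `Release`, `danceability`, `valence`, `energy`, `Length`).

```r
# Made-up values for three placeholder albums; NOT real Spotify data.
songs <- data.frame(
  Album        = rep(c("Album A", "Album B", "Album C"), each = 2),
  Release      = rep(c(2008, 2014, 2019), each = 2),
  danceability = c(0.55, 0.60, 0.70, 0.65, 0.72, 0.68),
  valence      = c(0.40, 0.50, 0.60, 0.55, 0.45, 0.35),
  energy       = c(0.50, 0.45, 0.75, 0.80, 0.60, 0.55),
  Length       = c(240, 230, 220, 210, 200, 190)
)

# Mean of each feature within each album
album_means <- aggregate(
  cbind(danceability, valence, energy, Length) ~ Album + Release,
  data = songs, FUN = mean
)

# Order albums chronologically so trends over time are easy to read
album_means <- album_means[order(album_means$Release), ]
print(album_means)
```

The result has one row per album, which is a convenient shape to hand to `ggplot()` for line or bar plots of each feature against release year.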

Clustering Analysis

After developing a general sense of how Taylor’s audio features changed over time, we used clustering analysis to investigate how similar her songs are to one another. Since our initial analysis showed very different mean values of danceability, energy, length, and valence across release dates, we hypothesized that the songs' audio features would form a distinct cluster for each of the 9 albums considered.

Before clustering the data, we tried to determine the optimal number of clusters using the elbow graph below:

The curve begins to flatten out at k = 3. This was surprising, since we had expected the data to cluster around the 9 albums. Additionally, even with 9 centers the clustering has a relatively low explained variance of a little more than 0.5.
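An elbow graph like this one can be produced by running k-means over a range of k and recording, for each k, the between-cluster sum of squares divided by the total sum of squares (assuming this ratio is the "explained variance" quoted above). A minimal sketch using base R's `kmeans()`; the feature matrix is random stand-in data, not the actual song features.

```r
# Random stand-in for the song-by-feature matrix (NOT the real data).
# scale() standardizes each column so no feature dominates the distances.
set.seed(1)
features <- scale(matrix(
  runif(120 * 4), ncol = 4,
  dimnames = list(NULL, c("danceability", "energy", "valence", "acousticness"))
))

# For each k, the share of total variance captured by the clustering:
# between-cluster sum of squares / total sum of squares.
ks <- 1:9
explained <- sapply(ks, function(k) {
  fit <- kmeans(features, centers = k, nstart = 25, iter.max = 50)
  fit$betweenss / fit$totss
})

# Elbow plot: the "elbow" is where adding clusters stops helping much
plot(ks, explained, type = "b",
     xlab = "Number of clusters (k)", ylab = "Between SS / Total SS")
```

Using `nstart = 25` restarts each fit from several random initializations, which makes the curve less sensitive to an unlucky starting point.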

Using 3 Clusters

Using three centers and plotting energy vs. valence, we can see three distinct clusters. Cluster 1 is characterized by low valence and low energy; cluster 2 by high energy and high valence; and cluster 3 by low valence but higher energy than cluster 1. The color coding shows that the clusters do not correspond to albums; instead, these two qualities are distributed across multiple albums.
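One way to check whether clusters line up with albums is to cross-tabulate cluster assignments against album labels. The sketch below uses placeholder albums and made-up energy/valence values, constructed so that each album contributes songs to more than one cluster; substituting the real data frame would reproduce the check on the actual songs.

```r
# Made-up energy/valence values for three placeholder albums (NOT real data)
songs <- data.frame(
  Album   = rep(c("Album A", "Album B", "Album C"), each = 4),
  energy  = c(0.30, 0.85, 0.32, 0.80, 0.60, 0.28, 0.83, 0.62, 0.58, 0.31, 0.61, 0.86),
  valence = c(0.25, 0.80, 0.20, 0.85, 0.30, 0.22, 0.78, 0.35, 0.28, 0.24, 0.33, 0.82)
)

# k-means with 3 centers on the standardized features
set.seed(42)
fit <- kmeans(scale(songs[, c("energy", "valence")]), centers = 3, nstart = 25)
songs$cluster <- fit$cluster

# Rows = albums, columns = clusters. If clusters tracked albums, each row
# would have all of its songs in a single column.
album_by_cluster <- table(songs$Album, songs$cluster)
print(album_by_cluster)
```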

As in the previous plot, the clustering shows that energy and acousticness are not album-specific but are distributed across albums. The plot also highlights the non-linear nature of Taylor’s sound. For example, while her second album, “Fearless”, is highly acoustic, her next album, “Speak Now”, sits on the opposite side of the graph. Taylor then returned to her earlier sound with “Red”, which clusters with “Fearless” high in acousticness.

Next, we examined how the clusters differ from one another by creating bar charts of the mean feature values within each cluster:

From the bar graphs, acousticness and Release appear to be the main factors distinguishing cluster 1, while valence, energy, and danceability distinguish cluster 2.
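The grouped means behind bar charts like these can be computed by averaging each feature within each cluster. A minimal sketch; `clustered` and its values are hypothetical stand-ins for the songs and their k-means assignments (for example, `fit$cluster`). Columns on very different scales, such as Release, would need to be standardized first for the bars to be comparable; the toy features here already share a 0-1 scale.

```r
# Hypothetical songs with made-up feature values and cluster labels
clustered <- data.frame(
  cluster      = c(1, 1, 2, 2, 3, 3),
  acousticness = c(0.80, 0.70, 0.10, 0.20, 0.15, 0.25),
  energy       = c(0.30, 0.40, 0.85, 0.75, 0.60, 0.70),
  valence      = c(0.20, 0.30, 0.80, 0.70, 0.30, 0.20)
)

# Mean of every feature within each cluster (one row per cluster)
cluster_means <- aggregate(. ~ cluster, data = clustered, FUN = mean)
print(cluster_means)

# Grouped bar chart: one group of bars per feature, one bar per cluster
barplot(as.matrix(cluster_means[, -1]), beside = TRUE,
        legend.text = paste("Cluster", cluster_means$cluster),
        ylab = "Mean value")
```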

The three-center analysis suggests that clustering on audio features does not distinguish albums well. To test this suspicion, we increased the number of centers to 9:

Using 9 Clusters

Using 9 centers and recreating the graphs from the three-center analysis, it becomes even harder to distinguish the albums from each other: each cluster contains multiple albums with very different release dates.

As with the three-center clustering, the nine-center clustering emphasizes how similar “Fearless” and “Red” are in their audio features. Most importantly, this clustering shows that Taylor varies both valence and energy across albums regardless of release date.

Next, we recreated the acousticness vs. energy plot using the 9-center clustering. Once again, “Fearless” and “Red” are highly concentrated in cluster 6, which is characterized by lower energy and higher acousticness. However, this cluster also contains songs from other albums such as “1989”, “Taylor Swift”, and “Speak Now”.

While not cluster specific, this plot also shows that her most recent albums (“Reputation”, “Lover”, “Evermore”, and “Folklore”) are lower in acousticness and higher in energy. Additionally, the album “1989” acts almost as a transition album between the two distinct zones.

Clustering Conclusion

Key takeaways:

  • Clustering on song audio features was not very useful for distinguishing albums (the explained variance also stayed below 0.6). One possible explanation is that Taylor has worked with many of the same producers throughout her career, so each album strikes a similar balance of features.

  • The sound of Taylor’s earlier albums (“Taylor Swift”, “Fearless”, “Speak Now”, “Red”) fluctuated the most, jumping between low energy with high acousticness and high energy with low acousticness.

  • Taylor’s album “1989” had the greatest variance across individual songs (in both clustering graphs) and acted almost as a transition album to her newer work which has concentrated in the higher energy and lower acousticness zone.
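The observation that “1989” varies most from song to song was read off the clustering plots. A possible numerical check, sketched here on made-up data for two placeholder albums, is the within-album standard deviation of each feature; the album names and values are hypothetical.

```r
# Made-up values for two placeholder albums (NOT real Spotify data)
songs <- data.frame(
  Album        = rep(c("Album A", "Album B"), each = 4),
  energy       = c(0.50, 0.55, 0.52, 0.48, 0.20, 0.90, 0.35, 0.80),
  acousticness = c(0.30, 0.28, 0.35, 0.32, 0.85, 0.05, 0.70, 0.15)
)

# Standard deviation of each feature within each album; a larger value
# means the album's songs are more spread out on that feature.
album_sd <- aggregate(cbind(energy, acousticness) ~ Album, data = songs, FUN = sd)
print(album_sd)
```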

Sentiment Analysis

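# Reading in data and setting it up for sentiment analysis
library(tidyverse)    # read_lines(), tibble(), dplyr verbs, ggplot2, %>%
library(tidytext)     # unnest_tokens(), stop_words, get_sentiments()
library(ggwordcloud)  # geom_text_wordcloud(), used for the word clouds
# The "afinn" and "nrc" lexicons are fetched through the textdata package the first time they are requested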
get_sentiments('afinn')
## # A tibble: 2,477 x 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # … with 2,467 more rows
get_sentiments('nrc')
## # A tibble: 13,875 x 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # … with 13,865 more rows
get_sentiments('bing')
## # A tibble: 6,786 x 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # … with 6,776 more rows
# Taylor Swift Album
ts <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/tswift"))
ts <- tibble(ts)
ts$ts <- as.character(ts$ts)
ts <- ts %>%
  unnest_tokens(word, ts)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)
## Joining, by = "word"
# Fearless Album
fearless <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/fearless"))
fearless <- tibble(fearless)
fearless$fearless <- as.character(fearless$fearless)
fearless <- fearless %>%
  unnest_tokens(word, fearless)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)
## Joining, by = "word"
# Speak Now Album
speak <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/speak_now")) 
speak <- tibble(speak)
speak$speak <- as.character(speak$speak)
speak <- speak %>%
  unnest_tokens(word, speak)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)
## Joining, by = "word"
# Red Album
red <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/red"))
red <- tibble(red)
red$red <- as.character(red$red)
red <- red %>%
  unnest_tokens(word, red)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)
## Joining, by = "word"
# 1989 Album
nineteen89 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/1989"))
nineteen89 <- tibble(nineteen89)
nineteen89$nineteen89 <- as.character(nineteen89$nineteen89)
nineteen89<- nineteen89 %>%
  unnest_tokens(word, nineteen89)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)
## Joining, by = "word"
# Reputation Album
rep <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/rep"))
rep <- tibble(rep)
rep$rep <- as.character(rep$rep)
rep <- rep %>%
  unnest_tokens(word, rep)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)
## Joining, by = "word"
# Lover Album
lover <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/lover"))
lover <- tibble(lover)
lover$lover <- as.character(lover$lover)
lover <- lover %>%
  unnest_tokens(word, lover)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)
## Joining, by = "word"
# Folklore Album 
folklore <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/folklore"))
folklore <- tibble(folklore)
folklore$folklore <- as.character(folklore$folklore)
folklore <- folklore %>%
  unnest_tokens(word, folklore)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)
## Joining, by = "word"
# Evermore Album 
evermore <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/evermore"))
evermore <- tibble(evermore)
evermore$evermore <- as.character(evermore$evermore)
evermore <- evermore %>%
  unnest_tokens(word, evermore)%>%
  anti_join(stop_words)%>% 
  count(word, sort=TRUE)
## Joining, by = "word"
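Each album block above repeats the same steps: read the lyrics, split them into words, drop stop words, and count. The dependency-free sketch below illustrates what those steps do; it is only an approximation of `unnest_tokens()` (which relies on the tokenizers package and handles punctuation and Unicode more carefully), and the stop-word list and lyric lines are made up.

```r
# Tiny hypothetical stop-word list; the analysis itself uses tidytext's
# much larger stop_words table.
my_stop_words <- c("a", "the", "and", "i", "you", "to", "in", "it", "me", "my")

count_words <- function(lines, stop_list = my_stop_words) {
  # Lowercase, then split on runs of anything other than letters/apostrophes
  tokens <- unlist(strsplit(tolower(lines), "[^a-z']+"))
  tokens <- tokens[tokens != ""]             # drop empty strings left by the split
  tokens <- tokens[!tokens %in% stop_list]   # analogous to anti_join(stop_words)
  counts <- sort(table(tokens), decreasing = TRUE)  # analogous to count(word, sort = TRUE)
  data.frame(word = names(counts), n = as.integer(counts), stringsAsFactors = FALSE)
}

# Two made-up lines standing in for an album's lyrics file
lyrics <- c("Rain on the road and rain in the sky",
            "The road is long and the rain is cold")
word_counts <- count_words(lyrics)
print(word_counts)
```

In the actual pipeline the same idea applies: wrapping `read_lines()`, `unnest_tokens()`, `anti_join(stop_words)` and `count()` in one function would replace the nine near-identical blocks with nine one-line calls.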

Sentiment Ranges For Each Album

# TS Album
ts_affin <- ts %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = ts_affin, aes(x = value)) +
  # AFINN scores are integers from -5 to 5, so use one bin per score
  geom_histogram(binwidth = 1, color = "seagreen", fill = "powderblue") +
  ggtitle("Taylor Swift Album Sentiment Range") +
  theme_minimal()

# Fearless Album
fearless_affin <- fearless %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = fearless_affin, aes(x = value)) +
  geom_histogram(binwidth = 1, color = "burlywood4", fill = "lightgoldenrod2") +
  ggtitle("Fearless Album Sentiment Range") +
  theme_minimal()

# Speak Now Album
speak_affin <- speak %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = speak_affin, aes(x = value)) +
  geom_histogram(binwidth = 1, color = "darkmagenta", fill = "deeppink3") +
  ggtitle("Speak Now Album Sentiment Range") +
  theme_minimal()

# Red Album
red_affin <- red %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = red_affin, aes(x = value)) +
  geom_histogram(binwidth = 1, color = "red4", fill = "indianred") +
  ggtitle("Red Album Sentiment Range") +
  theme_minimal()

# 1989 Album
nineteen89_affin <- nineteen89 %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = nineteen89_affin, aes(x = value)) +
  geom_histogram(binwidth = 1, color = "blueviolet", fill = "thistle2") +
  ggtitle("1989 Album Sentiment Range") +
  theme_minimal()

# Reputation Album
rep_affin <- rep %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = rep_affin, aes(x = value)) +
  geom_histogram(binwidth = 1, color = "gray19", fill = "gray82") +
  ggtitle("Reputation Album Sentiment Range") +
  theme_minimal()

# Lover Album
lover_affin <- lover %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = lover_affin, aes(x = value)) +
  geom_histogram(binwidth = 1, color = "lightskyblue", fill = "pink") +
  ggtitle("Lover Album Sentiment Range") +
  theme_minimal()

# Folklore Album
folklore_affin <- folklore %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = folklore_affin, aes(x = value)) +
  geom_histogram(binwidth = 1, color = "gray68", fill = "gray93") +
  ggtitle("Folklore Album Sentiment Range") +
  theme_minimal()

# Evermore Album
evermore_affin <- evermore %>%
  inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = evermore_affin, aes(x = value)) +
  geom_histogram(binwidth = 1, color = "coral3", fill = "navajowhite3") +
  ggtitle("Evermore Album Sentiment Range") +
  theme_minimal()
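Because each album's word table has one row per distinct word (with its frequency in `n`), these histograms count each distinct word once, however often it is sung. If frequency should matter, a weighted summary is one option; ggplot2's histogram also accepts a `weight` aesthetic, e.g. `aes(x = value, weight = n)`. The difference is sketched below on a made-up table; the words and scores are placeholders.

```r
# Made-up stand-in for an album's word counts joined to AFINN scores:
# one row per distinct word, n = occurrences, value = AFINN score.
album_afinn <- data.frame(
  word  = c("word1", "word2", "word3", "word4"),
  n     = c(10, 4, 3, 1),
  value = c(3, -3, 3, -3)
)

# Each distinct word counts once (what the histograms above display)
unweighted_mean <- mean(album_afinn$value)

# Each word counts as many times as it occurs in the lyrics
weighted_mean <- weighted.mean(album_afinn$value, w = album_afinn$n)

c(unweighted = unweighted_mean, weighted = weighted_mean)
```

Here the unweighted mean is 0, while the weighted mean is 24/18 (about 1.33), because the most frequent word is positive.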

Word Clouds

# Taylor Swift Album
set.seed(42)
ggplot(ts[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud() +
  theme_minimal() + scale_color_gradient(low = "seagreen4", high = "turquoise3") + ggtitle("Taylor Swift Album")

# Fearless
set.seed(42)
ggplot(fearless[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud() +
  theme_minimal() + scale_color_gradient(low = "goldenrod", high = "burlywood4")+ ggtitle("Fearless Album")

# Speak Now
set.seed(42)
ggplot(speak[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud() +
  theme_minimal() + scale_color_gradient(low = "deeppink3", high = "darkmagenta") + ggtitle("Speak Now Album")

# Red
set.seed(42)
ggplot(red[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud() +
  theme_minimal() + scale_color_gradient(low = "indianred", high = "red4")+ ggtitle("Red Album")

# 1989
set.seed(42)
ggplot(nineteen89[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud() +
  theme_minimal() + scale_color_gradient(low = "mediumpurple1", high = "blueviolet")+ ggtitle ("1989 Album")

# Reputation
set.seed(42)
ggplot(rep[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud() +
  theme_minimal() + scale_color_gradient(low = "gray66", high = "gray19")+ggtitle("Reputation Album")

# Lover
set.seed(42)
ggplot(lover[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud() +
  theme_minimal() + scale_color_gradient(low = "palevioletred1", high = "lightskyblue")+ ggtitle("Lover Album")

# Folklore
set.seed(42)
ggplot(folklore[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud() +
  theme_minimal() + scale_color_gradient(low = "gray68", high = "gray55")+ggtitle("Folklore Album")

# Evermore 
set.seed(42)
ggplot(evermore[1:50,], aes(label = word, size = n, color = n)
       ) +
  geom_text_wordcloud() +
  theme_minimal() + scale_color_gradient(low = "navajowhite3", high = "lightsalmon2")+ ggtitle("Evermore Album")

Bing Analysis

# Bing Analysis
# TS Album
ts_bing <- ts %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(ts_bing$sentiment)
## 
## negative positive 
##       42       24
# neg 42 pos 24
# Fearless
fearless_bing <- fearless %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(fearless_bing$sentiment)
## 
## negative positive 
##       47       44
# neg 47 pos 44
# Speak Now
speak_bing <- speak %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(speak_bing$sentiment)
## 
## negative positive 
##       89       47
#neg 89 pos 47
# Red
red_bing <- red %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(red_bing$sentiment)
## 
## negative positive 
##       79       59
# neg 79 pos 59
# 1989
nineteen89_bing <- nineteen89 %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(nineteen89_bing$sentiment)
## 
## negative positive 
##       74       27
# neg 74 pos 27
# Reputation
rep_bing <- rep %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(rep_bing$sentiment)
## 
## negative positive 
##      112       53
#neg 112 pos 53
# Lover
lover_bing <- lover %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(lover_bing$sentiment)
## 
## negative positive 
##       99       55
#neg 99 pos 55
# Folklore
folklore_bing <- folklore %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(folklore_bing$sentiment)
## 
## negative positive 
##      103       35
# neg 103 pos 35
# Evermore
evermore_bing <- evermore %>%
  inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(evermore_bing$sentiment)
## 
## negative positive 
##       87       55
# neg 87 pos 55
# Creating a dataframe with the negative and positive values for each album and release dates 
negative <- c(42, 47, 89, 79, 74, 112, 99, 103, 87)
positive <- c(24, 44, 47, 59, 27, 53, 55, 35, 55)
album <- c("Taylor Swift", "Fearless", "Speak Now", "Red", "1989", "Reputation", "Lover", "Folklore", "Evermore")
release_date <- c(2006, 2008, 2010, 2012, 2014, 2017, 2019, 2020, 2020)
sentiment <- data.frame(album, release_date, negative, positive, stringsAsFactors=TRUE)
# Normalizing the values for pos and neg
normalize <- function(x){
  (x - min(x)) / (max(x) - min(x))
}
sentiment$negative <- normalize(sentiment$negative)
sentiment$positive <- normalize(sentiment$positive)
# Creating graph with just positive and negative values 
plot <- ggplot(sentiment, aes(x = positive, y = negative, color = album, label = album)) +
  geom_text() +
  ggtitle("Negative vs. Positive Sentiment of Albums") +
  theme_light()
plot
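Min-max scaling ranks each album relative to the others: with the counts above, “Taylor Swift” (42 negative words, the fewest) maps to 0 and “Reputation” (112, the most) maps to 1, so the scaled values partly reflect how many sentiment words each album has overall. An alternative, sketched below with the same Bing counts, is the share of matched words that are negative, which is comparable across albums of different lengths. The helper name `minmax` is hypothetical and mirrors `normalize()`.

```r
# Bing counts from the tables above (distinct words per label)
album    <- c("Taylor Swift", "Fearless", "Speak Now", "Red", "1989",
              "Reputation", "Lover", "Folklore", "Evermore")
negative <- c(42, 47, 89, 79, 74, 112, 99, 103, 87)
positive <- c(24, 44, 47, 59, 27, 53, 55, 35, 55)

# Min-max scaling, as in normalize(): smallest count -> 0, largest -> 1
minmax <- function(x) (x - min(x)) / (max(x) - min(x))
neg_scaled <- minmax(negative)

# Share of sentiment-bearing words that are negative, on a 0-1 scale
# for every album regardless of how many words it has
neg_share <- negative / (negative + positive)
round(setNames(neg_share, album), 2)
```

By this measure “Folklore” has the highest negative share (103 of 138, about 0.75) and “Fearless” the lowest (47 of 91, about 0.52).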

# Graphing values in 3D plot using 3 variables (neg, pos, and release date)
library(plotly)
fig <- plot_ly(sentiment, 
               type = "scatter3d",
               mode = "markers",
               x = ~release_date, 
               y = ~positive, 
               z = ~negative,
               color = ~album,
               # Set1 has 9 colors, enough for all 9 albums (the default Set2 has only 8)
               colors = RColorBrewer::brewer.pal(9, "Set1"),
               text = ~paste('Album:', album))
fig

NRC Analysis

Taylor Swift

ts_nrc <- ts %>%
  inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(ts_nrc$sentiment)
## 
##        anger anticipation      disgust         fear          joy     negative 
##           17           24           14           18           28           34 
##     positive      sadness     surprise        trust 
##           48           21           16           28

Fearless

fearless_nrc <- fearless %>%
  inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(fearless_nrc$sentiment)
## 
##        anger anticipation      disgust         fear          joy     negative 
##           22           25           10           25           38           57 
##     positive      sadness     surprise        trust 
##           67           27           19           46

Speak Now

speak_nrc <- speak %>%
  inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(speak_nrc$sentiment)
## 
##        anger anticipation      disgust         fear          joy     negative 
##           33           41           19           50           48           81 
##     positive      sadness     surprise        trust 
##           74           50           32           49

Red

red_nrc <- red %>%
  inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(red_nrc$sentiment)
## 
##        anger anticipation      disgust         fear          joy     negative 
##           32           36           23           39           40           68 
##     positive      sadness     surprise        trust 
##           84           36           20           44

1989

nineteen89_nrc <- nineteen89 %>%
  inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(nineteen89_nrc$sentiment)
## 
##        anger anticipation      disgust         fear          joy     negative 
##           27           20           17           36           22           58 
##     positive      sadness     surprise        trust 
##           39           33           12           21

Reputation

rep_nrc <- rep %>%
  inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(rep_nrc$sentiment)
## 
##        anger anticipation      disgust         fear          joy     negative 
##           49           32           32           62           45           99 
##     positive      sadness     surprise        trust 
##           79           47           26           43
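Raw NRC counts are hard to compare across albums with different numbers of matched words. Converting the eight emotion counts into shares is one way to compare them; note that a single word can carry several NRC labels (as in the `get_sentiments('nrc')` output, where "abandon" is tagged fear, negative, and sadness), so these are shares of labels rather than of words. A sketch using the Reputation counts from the table above; the variable names are illustrative.

```r
# NRC counts for Reputation, copied from the table above
rep_nrc_counts <- c(anger = 49, anticipation = 32, disgust = 32, fear = 62,
                    joy = 45, negative = 99, positive = 79, sadness = 47,
                    surprise = 26, trust = 43)

# Keep only the eight emotions (drop the positive/negative polarity labels)
emotions <- rep_nrc_counts[!names(rep_nrc_counts) %in% c("positive", "negative")]

# Share of emotion labels falling into each emotion
emotion_share <- emotions / sum(emotions)
round(sort(emotion_share, decreasing = TRUE), 3)
```

For Reputation, fear is the largest share (62 of 336 emotion labels, about 0.185).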

Lover

lover_nrc <- lover %>%
  inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(lover_nrc$sentiment)
## 
##        anger anticipation      disgust         fear          joy     negative 
##           43           44           25           63           45           89 
##     positive      sadness     surprise        trust 
##           76           44           25           50

Folklore

folklore_nrc <- folklore %>%
  inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(folklore_nrc$sentiment)

Evermore

evermore_nrc <- evermore %>%
  inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(evermore_nrc$sentiment)

Conclusion


Future Work